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Abstract 


Parallelization  can  be  achieved  by  either  control  or  domain  decomposition.  The 
latter  was  tried  for  analytic  (by  Phipps  et  and  semianalytic  (by  Wallace^®) 

propagators.  Neal  and  Coffey^'*  discuss  the  domain  decomposition  for  special  perturba¬ 
tions.  The  control  decomposition  idea  is  inefficient  for  analytic  propagators  (Phipps^®), 
because  the  computation  time  is  too  short.  In  this  report  we  discuss  a  control  de¬ 
composition  approach  to  parallelize  a  numerical  orbit  propagator  which  may  be  more 
computationally  intensive. 
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1  Introduction 


The  orbit  of  an  earth  satellite  may  be  predicted  by  solving  the  differential  equations  of 
Newtonian  mechanics.  The  various  approximate  methods  for  solving  the  differential  equation 
in  this  case  may  be  divided  into  three  categories:  analytical,  semianalytical,  and  numerical. 

Within  the  next  few  years  the  military  expects  to  increase  the  number  of  objects  cataloged 
and  to  require  more  accurate  predictions.  With  a  commensurate  improvement  in  sensor  data, 
the  accuracy  of  the  predictions  could  be  improved  to  the  order  of  tens  of  meters  (depending 
on  time  from  epoch)  with  a  semi-analytical  theory  or  to  meters  with  a  completely  numerical 
solution.  However,  these  improvements  would  be  obtained  at  the  expense  of  a  great  increase 
in  computation  time  using  a  numerical  method  on  a  serial  computer. 

Fonte  et  aF  have  compared  the  above  three  categories  of  propagators  as  implemented 
in  R&D  GTDS®  in  various  orbital  regimes.  It  was  shown  that  accuracy  of  the  Draper 
semianalytic  (DSST)  propagator  when  used  with  “optimal”  parameters  is  close  to  that  of 
the  numerical,  but  the  CPU  time  required  is  much  less.  Testing  with  real  data  is  required 
to  convince  the  users  to  adopt  DSST.  Also  such  a  move  requires  training  of  potential  users. 
Thus  the  most  cost  effective  way  to  get  more  accurate  predictions  for  more  orbiting  objects  in 
a  short  amount  of  time  would  require  the  use  of  a  parallel  version  of  a  numerical  propagator. 

In  the  next  section  we  discuss  parallel  computing  as  applied  to  orbit  propagation.  In 
section  3,  numerical  propagators  will  be  discussed.  The  experience  gained  with  a  control 
decomposition  version  of  special  perturbations  orbit  propagator  is  described  in  section  4. 


2  Parallel  Computing 

Parallel  computing  is  defined  eis  the  efficient  form  of  information  processing  emphasizing 
the  concurrent  computations  and  manipulation  of  data  to  solve  a  single  problem  (see  e.g. 
Hwang  and  Briggs^°).  Parallel  computers  may  be  classified  according  to  their  architecture. 
Flynn^  has  introduced  a  scheme  to  classify  computers  into  four  categories  based  on  the 
multiplicity  of  instruction  and  data  streams.  The  serial  computers  are  called  SISD  (Single 
Instruction  Single  Data).  Array  processors  are  called  SIMD,  and  most  multiprocessors  are 
MIMD.  Another  way  of  classification  is  by  topology. 

Parallel  computing  offers  one  option  to  decrease  the  computation  time  and  achieve  more 
real-time  results.  Use  of  parallel  computers  has  already  proven  to  be  beneficial  in  reducing 
computation  time  in  many  other  applied  areas. 

Two  common  measures  of  effectiveness,  accounting  for  both  the  hardware  and  the  algo¬ 
rithm  are  speedup  and  efficiency.  The  speedup,  Sp,  of  an  algorithm  is  defined  as 

Sp  =  7fr  or  TfT  (1) 

J-  p  J-  p 


3 


where  Tg  is  the  time  on  a  serial  computer  and  T;  is  the  time  on  a  parallel  computer  having 
i  processors.  The  efficiency,  Ep,  is  defined  by 

^  (2) 

P 

and  it  accounts  for  the  relative  cost  of  achieving  a  specific  speedup,  many  factors  could 
possibly  limit  the  efficiency  of  a  parallel  program.  These  factors  include  the  number  of  se¬ 
quential  operations  that  cannot  be  parallelized,  the  communication  time  between  processors, 
and  the  time  each  processor  is  idle  due  to  synchronization  requirements,  see  e.g.  Quinn^^. 


2.1  Hypercube 

The  iPSC/2  INTEL  hypercube^^  is  a  MIMD  multicomputer  with  a  hypercube  topology.  The 
computer  consists  of  a  system  resource  manager  (host)  and  2”  individual  processors,  called 
nodes  {n  is  the  dimension  of  the  cube).  The  computing  nodes  may  be  augmented  by  a  vector 
extension  module  for  vector  operations.  Communication  among  the  nodes  and  the  host  are 
completed  through  message  passing.  See  Figure  1  for  hypercubes  with  various  dimensions 
n. 


Figure  1:  Hypercubes  with  dimension  n  =  0, 1,2, 3, 4 
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2.2  Parallel  Virtual  Machines 


Parallel  Virtual  Machine  (PVM)  is  a  small  (~1  Mbytes  of  C  source  code)  software  package 
that  allows  a  heterogeneous  network  of  Unix-based  computers  to  appear  as  a  single  large 
distributed-memory  parallel  computer.  The  PVM  package  is  good  for  large-grain  parallelism; 
that  is,  as  least  100  kbytes/node.  The  term  virtual  machine  is  used  to  designate  a  logical 
distributed-memory  computer  and  host  is  used  to  designate  one  of  the  member  computers. 

The  PVM  software,  developed  at  Oak  Ridge  National  Laboratory  (see  Dongara  et  aU  and 
Sunderam  et  aU"^)  supplied  the  functions  to  automatically  start  up  tasks  to  communicate  and 
synchronize  with  each  other.  A  problem  can  be  solved  in  parallel  by  sending  and  receiving 
messages  to  accomplish  multiple  tasks,  similar  to  send  and  receive  on  the  hypercube. 

PVM  handles  all  message  conversion  that  may  be  required  if  two  computers  use  different 
data  representations.  PVM  also  ensures  that  error  messages  generated  on  a  remote  computer 
axe  displayed  on  the  user’s  local  screen. 

Parallelization  could  have  been  accomplished  using  a  specific  parallel  multicomputer, 
such  as  the  INTEL  hepercube^^.  These  systems  tend  to  be  large  and  expensive.  While  PVM 
may  not  accomplish  the  tasks  as  fast  as,  say,  an  INTEL  iPSC/2  hypercube,  the  process 
execution  times  were  satisfactory  for  the  application  tested. 


2.3  Decomposition  Strategies 

Given  a  program  and  its  associated  data  set,  there  are  two  primary  ways  to  process  it  in 
parallel.  The  program  can  be  separated  into  individual  sections  (called  control  decomposi¬ 
tion)  with  a  processor  dedicated  to  compute  its  respective  part,  much  like  a  factory  assembly 
line.  The  other  method  domain  decomposition  is  to  divide  up  the  data  set  and  send  parts 
to  many  separate  processors  all  running  the  same  algorithm,  but  on  different  data. 

Figure  2  graphically  presents  these  relationships  between  the  node  distributing  data,  the 
node  collecting  results  and  the  workers. 

In  1992,  the  first  result  on  parallelization  of  orbit  propagators  was  obtained  by  our 
student  (W.  E.  Phipps^®’^°’^*).  These  results  were  presented  at  the  1993  Space  Surveillance 
Workshop  at  M.I.T.  Lincoln  Laboratory^®.  During  the  past  five  years  a  similar  idea  (domain 
decomposition)  was  applied  to  the  analytic  propagators  SGP  and  SGP4/SDP4  (see  Ostrom^®, 
Brewer^,  Neta  et  aP^,  and  Stone^^).  Our  students  developed  a  model  to  find  the  optimal 
number  of  processors  (in  the  sense  that  the  algorithm  is  most  efficient).  This  optimal  number 
depends  on  the  satellite  motion  model  used,  the  number  of  objects  and  the  number  of  calls  to 
the  propagator  for  each  object.  Ostrom^*  and  Neta  et  aP^  have  shown  that  one  can  achieve 
near  100%  efficiency.  Wallace^®  suggested  the  same  idea  for  the  semianalytical  propagator 
DSST.  Neal  and  Coffey^^  demonstrated  how  to  maintain  the  space  catalog  using  similar 
parallelism  idea  for  special  perturbations. 

The  control  decomposition  idea  for  analytic  propagators  is  inefficient  (as  demonstrated 
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Figure  2:  Domain  decomposition 


by  Phipps*®)  since  the  analytic  propagators  are  not  computationally  intensive. 

In  the  next  section,  we  discuss  the  possible  numerical  orbit  propagators  and  their  parallel 
version. 


3  Numerical  Propagators 


Neta  and  Lustman  have  developed  a  parallel  numerical  ODE  solvers  for  both  linear*^  and 
nonlinear  systems*^.  The  idea  here*^  is  to  use  extrapolation.  One  can  use  Euler’s  or  Gragg’s 
method  to  solve  the  system 


subject  to 


to  <t<b 


yiU)  ^  yo 


(3) 

(4) 


For  example,  Gragg’s  method 


^1/2  —  yo  +  |/(^o,  yo) 
yi  =  yo  +  hf{tif2,zii2) 


(5a) 


^n+l/2  ^n— 1/2  {tn-,  yn^ 

<  n  =  1,2,  •  •  •  (56) 

yn+l  —  yn  +  hf{tn+l/2,Zn+l/2) 
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has  truncation  error 


(6) 


t/n  -  y{nh)  =  B2h'^  +  H - 

Each  one  of  the  N  processors  uses  an  ODE  solver  with 

N 

hr  =  -H,  (7) 

thus  the  common  points  are  tj  =  to  +  (j  —  1)NH. 

Given 

{hr,y{ti,hr)\r  =  0,l,---,N-l-,  t  = 

the  solution  at  those  M  points  in  (to,  b)  is  computed  by  the  same  scheme  by  all  N  possible 
h's).  For  polynomial  extrapolation  we  construct  a  table  of  values  Trs  as  follows 


TrO 

Trs 


y{tii  hr) 
Tr+ls-1  + 


^  r-f-ls 


-i-T, 


rs-1 


itr+s 


s  =  1 


(8) 


Extrapolation  will  yield  0{h'^^)  accuracy  for  Gragg’s  scheme. 

In  the  following  table  we  show  which  processor  computes  which  part  of  the  solution. 


processor 

step 

proc  1 

proc  2 

proc  3 

proc  4 

proc  5 

proc  6 

Too 

1 

Toi 

Tio 

2 

AH 

Til 

To2 

To3 

T20 

3 

oolco 

T21 

Ti2 

Ti3 

To4 

To5 

T30 

4 

2H 

T31 

T22 

T23 

Ti4 

Ti5 

Toe 

T40 

5 

OOllO 

T41 

T32 

T33 

T24 

T25 

Tie 

T50 

6 

Tsi 

T42 

T43 

T34 

Too 

7 

IH 

Tei 

T52 

T70 

8 

H 

Table  : 

1:  Extrapolation 

assigned  to  each  processor 

proc  7 


The  efficiency  of  this  algorithm  (based  on  low  order  integrator  and  extrapolation)  is  over  75%. 
These  results  were  presented  in  Numerisk  Institut  in  Denmark^®  and  in  the  Fourth  Interna¬ 
tional  Colloquium  on  Differential  Equations  in  Bulgaria^^.  These  algorithms  were  combined 
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with  a  numerical  propagator  to  get  a  parallel  special  perturbations  model.  Fukushima®  dis¬ 
cusses  the  round-off  error  reduction  in  the  extrapolation  methods  as  they  apply  to  orbital 
motion. 

Fukushima®  has  suggested  a  numerical  method  based  on  Picard  iteration  and  Cheby- 
shev  polynomials  for  approximating  orbital  motion.  The  idea  is  to  integrate  the  differential 
equation 

^  a<t<h  (9) 

subject  to 

y(io)  =  yo,  o,<to<  h.  (10) 

Assuming  y^^\t)  is  an  initial  guess  for  the  iteration,  then 


=  yo  +  I  f{y^^\s),s)ds,  a  <  t  <  6,  n  =  0, 1, . . .  (11) 

JtQ 

Both  y{t)  and  /  are  expanded  in  Chebyshev  polynomials  of  the  first  kind,  with  coejfficients 
Fj,  Fj,  respectively. 

Given  all  the  coefficients  one  can  compute  This,  in  turn, 

will  give  the  right  hand  side,  4”^^^)-  From  this  one  can  compute  the 

coefficients  and 

Fukushima®  claims  that  “Clearly  the  present  method  is  accelerated  by  using  parallel 
computers.  This  is  because  the  evaluation  of  the  integrand  can  be  done  in  parallel.  Since 
the  computational  time  of  the  numerical  integration  is  mainly  occupied  by  the  integrand 
evaluation,  we  can  expect  a  significant  gain  in  real-time  speed.  In  principle,  the  ratio  of 
speed-up  will  become  as  many  as  the  number  of  processors.”  This  last  sentence  means 
that  the  efficiency  is  close  to  100%.  I  believe  that  there  is  enough  sequential  work  that  will 
cause  degradation  of  the  efficiency,  and  one  should  experiment  with  the  method  on  a  parallel 
machine  to  get  the  actual  efficiency.  Several  private  communications  with  Fukushima  reveal 
that  in  follow  up  papers'^’®,  he  developed  a  vectorization  of  the  method  on  a  Fujitsu  VX/IR 
(with  vector  length  of  2048).  This  vector  method  shows  a  gain  of  more  than  a  1000  times, 
which  is  around  50%  efficiency. 

For  satellite  problems,  y{t)  and  f{y{t),  t)  in  (9)  are  arrays. 


y(0  = 


/(y(0.^)  = 


where  f,  u,  and  a,  are  the  position,  velocity  and  acceleration  vectors,  respectively.  Vallado^® 
discusses  fourth-order  Runge-Kutta  method  (single  step),  as  well  as  Runge-Kutta-Fehlberg 
(variable  step  size)  and  Adams-Bashforth-Moulton  predictor-corrector  (multi-step)  methods. 
To  obtain  fourth-order  or  higher,  one  may  use  extrapolation  with  first-order  Euler’s  method 
or  second-order  Gragg’s  method  as  discussed  above.  Using  N  processors  and  extrapolating 
Gragg’s  method,  one  can  get  2Ar-order  accuracy.  This  idea  is  of  control  decomposition  type. 
It  yields  higher  efficiency  than  the  Picard- Chebyshev  method  advocated  by  Fukushima. 
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4  SPEPH 


In  this  subsection  we  give  the  results  of  our  experimentation  with  SPEPH,  the  special  per¬ 
turbation  code  used  by  AFSPACECOM.  We  are  not  comparing  the  accuracies  attained  by 
various  numerical  propagators.  We  are  only  interested  in  the  computation  time  in  order  to 
assess  the  feasibility  of  control  decomposition.  To  this  end,  we  have  ran  one  example  where 
the  orbit  is  propagated  to  15  minutes  ahead.  It  is  clear  that  the  total  run  time  of  3.19 
seconds  is  too  short  to  use  control  decomposition.  If  we  now  increase  the  length  of  period  of 
propagation  to  3  days  and  15  minutes,  then  the  total  run  time  increases  to  19.1  seconds,  most 
of  it  (over  90%)  is  in  the  subroutine  SPOOX,  the  SP  integration  driver.  Even  this  is  not 
computationally  intensive  enough.  Several  other  orbits  were  propagated  for  various  length 
of  time  and  the  total  run  time  was  always  too  short  to  justify  control  decomposition.  I  am 
sure  that  the  reason  that  SPEPH  is  not  used  on  all  object  is  the  fact  that  one  requires  many 
calls  to  the  propagator  to  accomplish  the  dilferential  corrections  and  update.  Therefore  one 
should  consider  the  domain  decomposition  idea  as  implemented  by  Neal  et  aP'*.  A  more 
radical  solution  is  to  reconsider  the  differential  correction  process  and  see  if  one  can  save  on 
the  number  of  calls. 


5  Conclusions 

It  has  been  shown  that  several  methods  in  the  literature  yield  an  efficincy  of  over  75%  for  the 
solution  of  systems  of  first  order  ordinary  differential  equations  such  as  the  orbit  prediction 
problem.  We  have  access  to  a  special  perturbations  code  currently  in  use  and  our  assessment 
of  the  eflSciency  of  its  control  decomposition  was  discussed. 
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